Bridging statistical learning and formal reasoning for cyber attack detection
Current cyber-infrastructures are facing increasingly stealthy attacks that implant malicious payloads under the cover of benign programs. Existing attack detection approaches based on statistical learning methods may generate misleading decision boundaries when processing noisy data with such a mixture of benign and malicious behaviors. On the other hand, attack detection based on formal program analysis may lack completeness or adaptivity when modeling attack behaviors. In light of these limitations, we have developed LEAPS, an attack detection system based on supervised statistical learning to classify benign and malicious system events. Furthermore, we leverage control flow graphs inferred from the system event logs to enable automatic pruning of the training data, which leads to a more accurate classification model when applied to the testing data. Our extensive evaluation shows that, compared with pure statistical learning models, LEAPS achieves consistently higher accuracy when detecting real-world camouflaged attacks with benign program cover-up.
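The pruning idea described above can be illustrated with a toy sketch. This is not the LEAPS implementation; the group ids below are a hypothetical stand-in for control-flow-graph regions inferred from event logs, and the centroid classifier stands in for the supervised learning model:

```python
# Toy sketch (assumed names, not LEAPS itself): prune noisy training
# events using program-context groups, then fit a centroid classifier.
from collections import defaultdict

def prune_by_context(events):
    """Drop events whose label disagrees with the majority label of
    their control-flow context group (a stand-in for CFG-based pruning)."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev["cfg_group"]].append(ev)
    kept = []
    for group in groups.values():
        majority = max({e["label"] for e in group},
                       key=lambda lbl: sum(e["label"] == lbl for e in group))
        kept.extend(e for e in group if e["label"] == majority)
    return kept

def fit_centroids(events):
    """Per-label mean feature vector over the (pruned) training events."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for ev in events:
        for i, x in enumerate(ev["features"]):
            sums[ev["label"]][i] += x
        counts[ev["label"]] += 1
    return {lbl: [s / counts[lbl] for s in vec] for lbl, vec in sums.items()}

def classify(centroids, features):
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))

# One noisy (mislabeled) event hides inside the benign context group.
train = [
    {"cfg_group": "g1", "label": "benign",    "features": [1.0, 0.0]},
    {"cfg_group": "g1", "label": "benign",    "features": [0.9, 0.1]},
    {"cfg_group": "g1", "label": "malicious", "features": [1.1, 0.0]},  # noise
    {"cfg_group": "g2", "label": "malicious", "features": [0.0, 1.0]},
    {"cfg_group": "g2", "label": "malicious", "features": [0.1, 0.9]},
]
pruned = prune_by_context(train)    # the noisy event is removed
model = fit_centroids(pruned)
print(classify(model, [0.95, 0.05]))
```

Without the pruning step, the mislabeled event would pull the malicious centroid toward the benign cluster, which is the "misleading decision boundary" failure mode the abstract describes.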
Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems
Due to the increasing usage of machine learning (ML) techniques in security-
and safety-critical domains, such as autonomous systems and medical diagnosis,
ensuring correct behavior of ML systems, especially for different corner cases,
is of growing importance. In this paper, we propose a generic framework for
evaluating security and robustness of ML systems using different real-world
safety properties. We further design, implement and evaluate VeriVis, a
scalable methodology that can verify a diverse set of safety properties for
state-of-the-art computer vision systems with only blackbox access. VeriVis
leverages input space reduction techniques for efficient verification
of different safety properties. VeriVis is able to find thousands of safety
violations in fifteen state-of-the-art computer vision systems including ten
Deep Neural Networks (DNNs) such as Inception-v3 and Nvidia's Dave self-driving
system with thousands of neurons as well as five commercial third-party vision
APIs including Google vision and Clarifai for twelve different safety
properties. Furthermore, VeriVis can successfully verify local safety
properties, on average, for around 31.7% of the test images. VeriVis finds up
to 64.8x more violations than existing gradient-based methods, which, unlike
VeriVis, cannot guarantee the absence of violations. Finally, we show that
retraining using the safety violations detected by VeriVis can reduce the
average number of violations by up to 60.2%.

Comment: 16 pages, 11 tables, 11 figures
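The core verification loop can be sketched in miniature. This is an illustration of the general approach, not VeriVis itself: the finite transformation space here is quarter-turn rotations, and the "vision system" is a hypothetical blackbox classifier:

```python
# Illustrative sketch (not the VeriVis implementation): exhaustively
# check a safety property -- rotation invariance -- against a blackbox
# classifier by enumerating a finite transformed-input space.

def rotate90(grid):
    """Rotate a square grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def verify_rotation_invariance(classify, grid):
    """Return the quarter-turn counts that violate
    classify(rotate(x)) == classify(x); an empty list means verified."""
    base = classify(grid)
    violations = []
    g = grid
    for k in range(1, 4):
        g = rotate90(g)
        if classify(g) != base:
            violations.append(k)
    return violations

# A toy blackbox "vision system": label by which row holds more mass.
def toy_classifier(grid):
    return "top-heavy" if sum(grid[0]) >= sum(grid[-1]) else "bottom-heavy"

image = [[1, 1], [0, 0]]
print(verify_rotation_invariance(toy_classifier, image))  # [2]
```

Because the transformed space is enumerated exhaustively, an empty result is a proof of the property over that space, which is the guarantee gradient-based attack methods cannot provide.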
Development of a Neural Network-Based Mathematical Operation Protocol for Embedded Hexadecimal Digits Using Neural Architecture Search (NAS)
It is beneficial to develop an efficient machine-learning based method for
addition using embedded hexadecimal digits. Through a comparison between a
human-developed machine learning model and models sampled through Neural
Architecture Search (NAS), we determine an efficient approach to solving this
problem, with a final testing loss of 0.2937 for the human-developed model.
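To make the task concrete, here is a hypothetical sketch of one way the dataset could be set up; the one-hot embedding and names below are assumptions, not the paper's actual protocol:

```python
# Assumed task setup (not the paper's protocol): addition over pairs of
# hexadecimal digits, each digit embedded as a 16-dim one-hot vector.

HEX = "0123456789abcdef"

def one_hot(digit):
    """Embed a hexadecimal digit as a 16-dimensional one-hot vector."""
    vec = [0.0] * 16
    vec[HEX.index(digit)] = 1.0
    return vec

def make_dataset():
    """All 256 (a, b) hex-digit pairs with the integer sum as target."""
    return [((one_hot(a), one_hot(b)), HEX.index(a) + HEX.index(b))
            for a in HEX for b in HEX]

data = make_dataset()
(xa, xb), y = data[0]
print(len(data), y)  # 256 pairs; '0' + '0' = 0
```

A model (human-designed or NAS-sampled) would then regress the target sum from the concatenated embeddings, with testing loss measured on held-out pairs.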
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Language models have outpaced our ability to evaluate them effectively, but
for their future development it is essential to study the frontier of their
capabilities. We consider real-world software engineering to be a rich,
sustainable, and challenging testbed for evaluating the next generation of
language models. We therefore introduce SWE-bench, an evaluation framework
including software engineering problems drawn from real GitHub issues
and corresponding pull requests across popular Python repositories. Given
a codebase along with a description of an issue to be resolved, a language
model is tasked with editing the codebase to address the issue. Resolving
issues in SWE-bench frequently requires understanding and coordinating changes
across multiple functions, classes, and even files simultaneously, calling for
models to interact with execution environments, process extremely long contexts
and perform complex reasoning that goes far beyond traditional code generation.
Our evaluations show that both state-of-the-art proprietary models and our
fine-tuned model SWE-Llama can resolve only the simplest issues. Claude 2 and
GPT-4 solve a mere % and % of instances respectively, even when
provided with an oracle retriever. Advances on SWE-bench represent steps
towards LMs that are more practical, intelligent, and autonomous.

Comment: Data, code, and leaderboard are available at https://www.swebench.co
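The resolution criterion can be sketched in a few lines. This is a minimal illustration of the idea, not the actual SWE-bench harness (which applies real patches and runs real test suites); all names below are assumptions:

```python
# Minimal sketch (assumed names, not the SWE-bench harness): a patch
# resolves an instance when the previously failing tests now pass and
# the previously passing tests keep passing.

def run_tests(module_src, tests):
    """Exec the candidate module source and report which tests pass."""
    ns = {}
    exec(module_src, ns)
    results = {}
    for name, test in tests.items():
        try:
            test(ns)
            results[name] = True
        except AssertionError:
            results[name] = False
    return results

def resolves(module_src, fail_to_pass, pass_to_pass):
    results = run_tests(module_src, {**fail_to_pass, **pass_to_pass})
    return all(results[n] for n in fail_to_pass) and \
           all(results[n] for n in pass_to_pass)

def _assert(cond):
    assert cond

buggy   = "def area(w, h):\n    return w + h\n"   # the reported issue
patched = "def area(w, h):\n    return w * h\n"   # model-edited codebase

fail_to_pass = {"test_area": lambda ns: _assert(ns["area"](2, 3) == 6)}
pass_to_pass = {"test_square": lambda ns: _assert(ns["area"](1, 1) == 1)}

print(resolves(buggy, fail_to_pass, pass_to_pass),
      resolves(patched, fail_to_pass, pass_to_pass))  # False True
```

In the real benchmark the edit spans whole repositories rather than a single function, which is what makes the task demand long-context reasoning and multi-file coordination.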
Symmetry-Preserving Program Representations for Learning Code Semantics
Large Language Models (LLMs) have shown promise in automated program
reasoning, a crucial aspect of many security tasks. However, existing LLM
architectures for code are often borrowed from other domains like natural
language processing, raising concerns about their generalization and robustness
to unseen code. A key generalization challenge is to incorporate the knowledge
of code semantics, including control and data flow, into the LLM architectures.
Drawing inspiration from examples of convolution layers exploiting
translation symmetry, we explore how code symmetries can enhance LLM
architectures for program analysis and modeling. We present a rigorous
group-theoretic framework that formally defines code symmetries as
semantics-preserving transformations and provides techniques for precisely
reasoning about symmetry preservation within LLM architectures. Using this
framework, we introduce a novel variant of self-attention that preserves
program symmetries, demonstrating its effectiveness in generalization and
robustness through detailed experimental evaluations across different binary
and source code analysis tasks. Overall, our code symmetry framework offers
rigorous and powerful reasoning techniques that can guide the future
development of specialized LLMs for code and advance LLM-guided program
reasoning tasks.
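One concrete instance of the symmetry-preservation property is easy to demonstrate: position-free dot-product self-attention commutes with permutations of its input sequence. The toy layer below is an assumption-laden sketch (identity Q/K/V projections, no positional encoding), not the paper's architecture:

```python
# Toy demonstration (not the paper's variant): self-attention without
# positional encoding is equivariant under input-sequence permutations,
# a simple example of a symmetry-preserving layer.
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def self_attention(tokens):
    """Dot-product self-attention with identity Q/K/V projections."""
    out = []
    for q in tokens:
        scores = softmax([sum(a * b for a, b in zip(q, k)) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(scores, tokens))
                    for i in range(len(q))])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perm = [2, 0, 1]                               # reorder the sequence
ya = self_attention([x[i] for i in perm])      # permute, then attend
yb = [self_attention(x)[i] for i in perm]      # attend, then permute
same = all(abs(a - b) < 1e-9
           for ra, rb in zip(ya, yb) for a, b in zip(ra, rb))
print(same)  # True: the layer commutes with permutations
```

Code symmetries such as instruction reordering within semantics-preserving limits play the role that spatial translation plays for convolutions; the paper's contribution is an attention variant that respects those program-level symmetries rather than the sequence permutation shown here.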